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Abstract 

This paper develops a Bayesian procedure for estimation and forecasting of the volatil- 
ity of multivariate time series. The foundation of this work is the matrix-variate dynamic 
linear model, for the volatility of which we adopt a multiplicative stochastic evolution, 
Ph ' using Wishart and singular multivariate beta distributions. A diagonal matrix of discount 

M— I , factors is employed in order to discount the variances element by element and therefore 

qh' allowing a flexible and pragmatic variance modelling approach. Diagnostic tests and se- 

quential model monitoring are discussed in some detail. The proposed estimation theory 
is applied to a four-dimensional time series, comprising spot prices of aluminium, copper, 
lead and zinc of the London metal exchange. The empirical findings suggest that the pro- 
' posed Bayesian procedure can be effectively applied to financial data, overcoming many 

of the disadvantages of existing volatility models. 
' Some key words: Time scries, volatility, multivariate, dynamic linear model, Bayesian, 

forecasting, state space, Kalman filter, GARCH, London metal exchange. 
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59 ■ 1 Introduction 

O 

In the last two decades, multivariate time series have received considerable attention with 
^ , the emphasis being placed on state space models (Liitkepohl, 1993, West and Harrison, 1997, 

^ \ Chapter 16; Durbin and Koopman, 2001, Chapter 3; De Gooijer and Hyndman, 2006). From 

an econometrics standpoint time-varying volatility models have been widely developed, recog- 
nizing the essence that the volatility and the correlation of assets change over time. Although 
univariate volatility models are useful in estimating and forecasting volatility, it is widely 
recognized (Bauwens et al, 2006) that multivariate models, which can model the serial and 
cross correlation of the assets, should be employed. 

From a time series standpoint, volatility models are developed within two main families of 
models: the multivariate generalized autoregressive conditional heteroskedastic (MGARCH), 
including the multivariate ARCH, and the multivariate stochastic volatility (MSV) families. 
Multivariate ARCH models include the diagonal vech model (Bollerslev et al, 1988), the 
constant conditional correlation model (Bollerslev, 1990), the factor- ARCH model (Engle et 
al, 1990), the BEKK model (Engle and Kroner, 1995) and the latent factor ARCH model 
(Diebold and Nerlove, 1989); see also Wong and Li (1997), Tse and Tsui (2002), Comte and 
Lieberman (2003), and Audrino and Barone-Adesi (2006). MSV models have also received 
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a lot of attention, see e.g. Harvey et al. (1994), Jacquier et al. (1995), Kim et al. (1998), 
Pitt and Shephard (1999), Aguiliar and West (2000) and Meyer et al. (2003). A number 
of estimation procedures have been suggested for MSV models; for instance, see Bauwens 
et al. (2006), Yu and Meyer (2006), Liesenfeld and Richard (2006), Asai et al. (2006) 
and Maasoumi and McAleer (2006). In this context, several variations of computationally 
expensive Markov chain Monte Carlo (MCMC) methods are commonly used following papers 
by Shephard (1993), Jacquier et al. (1994), Kim et al. (1998), Shephard and Pitt (1997), 
Uhlig (1997), Chib et al. (2002) and Philipov and Glickman (2006a, 2006b). 

Most of the proposed models are aimed at specific applications, or they impose restrictions 
in the parameter space, or they are only available for data with low dimensionality. In 
particular, it would be desirable to obtain estimation algorithms, for which the model would 
estimate not only the volatility covariance matrix, but also shocks in the levels of the returns. 
In addition to that, it is desirable to construct a model that will not rely on Monte Carlo or 
any other simulation procedures and also will not target data of specific applications. 

In this paper we develop a general state space model, which allows the volatility covariance 
matrix to be estimated with a fast Bayesian algorithm. The proposed algorithm is achieved 
by considering a stochastic multiplicative model for the volatility, which is based on Wishart 
and singular multivariate beta distributions. A diagonal matrix of degrees of freedom is 
used in a variance discounting approach in order to update the estimates and the forecasts 
of the volatility from time t — 1 to time t. This has a unique advantage that different 
volatilities can be discounted at different rates, for example one can have two assets, the 
volatility of the first changes at a rate according to a discount factor of 0.7 and the volatility 
of the second changes at a slower rate according to a discount factor of 0.95. The algorithm 
is fast and provides not only one-step ahead forecasts of the volatility, but also the entire 
one-step ahead forecast distribution of the volatility. A Bayesian algorithm is outlined for 
sequential model comparison. The proposed methodology is illustrated by considering data, 
consisting of spot prices of aluminium, copper, lead and zinc from the London metal exchange. 
It is found that the volatilities of aluminium and zinc prices are driven from a common 
factor and the volatilities of copper and lead prices are driven from another factor, while the 
respective correlations are around ±0.5. The performance of the model is discussed by using 
several diagnostic toolkits, including the mean of squared standardized forecast errors, the 
log-likelihood function and Value-at-Risk. 

The paper is organized as follows. Section [2] defines the model, for which inference is 
developed in Section [3l Section [J] discusses diagnostic tests and model comparison, and 
the following section analyzes data from the London metal exchange market. In Section 
[6] we discuss the advantages of the proposed approach as compared with existing GARCH 
estimation procedures. The appendix gives full mathematical details (including the proofs) 
of arguments in Sections [3] and HI 

2 Matrix- Variate Dynamic Linear Models 

Matrix-variate dynamic linear models (MV-DLMs) are introduced in Quintana and West 
(1987) and they are further developed in Salvador et al. (2003), Salvador and Gargallo 
(2004), Salvador et al. (2004), Triantafyllopoulos and Pikoulas (2002) and Triantafyllopoulos 
(2006a); matrix-variate DLMs are reported in some detail in West and Harrison (1997, §16.4). 
For the purpose of this paper the discussion is restricted to vector-valued time series; the 
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general description for matrix- valued time series can be found in Salvador et al. (2003). We 
should note that from a frequentist standpoint, MV-DLMs have been developed in Harvey 
(1986, 1989), Harvey and Snyder (1990), and Fernandez and Harvey (1990). Suppose that 
the p-dimensional response vector yt follows a matrix-variate DLM so that 

y', = FlQt + e't and = GtOt^i + ujt, (1) 

where Ft is a d-dimensional design vector, Gt is a d x d evolution matrix and Qt is a d x p 
state matrix. Conditional on S^, the innovations et and tot follow, respectively, multivariate 
and matrix-variate normal distributions, i.e. 

et IS* ~ MpxiiO, St) and uJt\^t ~ ^fdxpiO, ^t, S*), 

where T,t is the unknown p x p volatility covariance matrix of the innovations et , and 0,t is a 
d X d covariance matrix of the innovation ujf The distribution of uJt\T,t may also be written 
as 

vec{ujt)\^t ~ ^fdpxl{o, St (g) r^t), 

where vec(-) denotes the column stacking operator of a matrix and denotes the Kronecker 
product. It is assumed that the innovation series {et} and {uJt} are internally and mutually 
uncorrelated and also they are uncorrelated with the assumed priors 

@o\Y,o ^ Mdxp{mo,Po,T,o) and T,o ^ IWp{no + 2p, Sq), (2) 

for some known mo, Pq, tiq and Sq. Here S ~ IWp{k,S) denotes the inverted Wishart 
distribution with k degrees of freedom and parameter matrix S with density function 

= — FT ' ' ,,,, etr — 5S-^ , k>2p, 

' rp{(A:-p-l)/2}|S|fc/2 V 2 

where Tp[-) denotes the multivariate gamma function, etr(-) denotes the exponent of a trace 
of a matrix, and \S\ denotes the determinant of 5. Then follows the Wishart distribution 
Wp{k — p — 1, 5""*^). Let be a positive integer and write = {yi, y2i • • • > Vt} the information 
set comprising observations up to time t, for t = 1,2, . . . , N . The covariance matrix Qt is 
specified with at most d discount factors 61,62, ■■■ ,6^ so that 

St_i (E)nt = Var {vec (^A^/^GtQt-i\^t-i,y''^) } , 

where A = diag{(l — 6i)/6i, . . . , {1 — 6d)/6(i}- Thus i^t is the implied covariance matrix 
obtained after discounting is used in order to increase the covariance matrix from time t — 1 
to time t, given information The above equation justifies that 

St_i Of = Var{(/p ® AV2Gf)vec(ef_i)|Sf_i} = {Ip A^/^Gt){J:t^i ^ Pt~i){Ip ® G^A^/^) 
= T^t-i^A'/^GiPt-iG'tA^'^ 

implying = A^/^GfPf-iGf A^/^, where Pt-i is the left covariance matrix of Gj_i|y*~^, so 
that Yit-i® Pt~i = Var{vec(0j_i)|St_i, (in Section[3]it is shown that Pt-i is calculated 
routinely) . It is proposed that the above setting for Vtt is carried out for the covariance matrix 
Var{vec(wf)|Sf} = Tit^^t- This setting, which generalizes the single discounting approach of 
West and Harrison (1997), is necessary to consider in order to retain conjugate forms in the 
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updating of the posterior distribution of G(|S(,y* (see Section [3|). Multiple discount factors 
are useful in capturing the different structural characteristics of trend, seasonal and regression 
coefficient elements of the evolution matrix i}f. 

The volatility matrix Sf imposes complications in inference, but, it is a very useful consid- 
eration in the model because in financial time series, high frequency data exhibit short-term 
or long-term heteroscedastic behaviour. In the remainder of this section we describe the 
stochastic model governing the evolution of S^. 

At time t — 1 we assume that conditional on y^~^, follows an inverted Wishart 

distribution, ZyVp{n + 2p,St-i), for some n and St~i- The precision matrix is 

indicated by $t = and, following a Choleski decomposition, we write $t = Cj-Ct, where 
Ct is the unique upper triangular matrix of the Choleski decomposition. The law governing 
the evolution of the Tit or from time t — 1 to time t is represented by 

<^t = r'^^CUBtCt^iP~'/\ (3) 

where f3 = diag(/9i , , • • • , (3p) is diagonal matrix of discount factors /3i,/32, ■ ■ ■ , Pp and Bt, 
which given is assumed to be independent of is a random matrix following the 

singular multivariate beta distribution with parameters {p~^tv{f3)n+p—l) /2 and 1/2; we write 
~ Bp{{p~hr{p)n + p-l)/2, 1/2}. In Section [3] we will see that n = 1/(1 -p~Hr(/3)). 
Of course n is defined for < /3i < 1, for i = I, . . . ,p. For more details on the singular 
multivariate beta distribution the reader is referred to Uhlig (1994), Diaz-Garcia and Gutierrez 
(1997), and Srivastava (2003). 

The evolution ([3]) is motivated from the univariate case {p = 1), for which ([3]) reduces to 

$f = r^^t^iBf (4) 

In this case the multivariate singular beta reduces to a standard beta distribution and as Bt is 
independent of ^t-i, we have E{^t\y^~^) = ^i^t-i\y^~^) and Var($t|y*~^) > Var(<I>t_i |y*~^), 
since < /3 < 1. This defines a random walk type evolution for The above evolution for a 
scalar volatility is studied in Harrison and West (1987), West and Harrison (1997, §10.8), 
and Triantafyllopoulos (2007). 

Returning to the case when p > I, suppose that Pi = ■ ■ ■ = (3p and so /5 = (3ilp. In this 
case the evolution ([3]) reduces to 

= /3^^CUBtCt-i, 

where < (3i < I. In Proposition[T]of the appendix, it is shown that TyVp{p~^tT{(3)n+ 
2p,py^St-iP^/^) or ~ Wp{(3in + p - l,/3r'5,-\) and so 

Ei^t\y'~') = Wm+p- l)(3^'Sr\ = (n + ^) S^\, (5) 

which is different than ¥.{^t-i\y^'~^) = {n + p — l)S^^i, unless Pi = 1. It follows that the 
random walk type evolution of ([4]) is retained for values of Pi close to 1, but otherwise 
the evolution ([3]) defines a shrinkage type evolution, for which E(<I>t|y*~^) > K{^t-i\y^~^)- 
Our empirical results of Section [5.21 show that the estimator of S^, which is generated from 
evolution ([3]), performs well for relatively high values of the discount matrix p. For p = 1, 
West and Harrison (1997, §10.8) suggest a slow evolution for which a discount factor close 
to 1 is proposed. In particular on page 361 of the above reference it is stated "We note that 
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practically suitable variance discount factors take values near unity, typically between 0.95 
and 0.99". This is in agreement with our proposal, in the general case of p > 1, so that the 
shrinkage effect in 1^ is small. However, our empirical results in Section [5.21 suggest that the 
modeller should allow for smaller values of the discount factors in the range of 0.6 and 0.99 so 
that shocks in the volatility can be estimated. The evolution (0) makes the assumption that 
all elements of are discounted at the same rate via the single discount factor j3i. Equation 
([3]) introduces a flexible evolution, where each of the diagonal elements of are discounted 
at different rate via the p discount factors Pi, . . . , Pp. 



3 Estimation 



Prom evolution ([3]) and Proposition [T] of the appendix, the prior density of S^ly* ^ is the 
inverted Wishart density 



~ IWp{p-hv{p)n + 2p,p^/^St-iP'^^), 



(6) 



where n = 1/(1 — p~^tv(P)). 

Without loss in clarity of the presentation, we denote by p{X) the probability density 
function of a random matrix X, avoiding to explicitly write px{X). Thus if X and Y denote 
two different random matrices, p{X) and p(Y) denote respectively the densities of X and Y. 

Prom model ([I]), given and y*~^, the joint distribution of y'-f. and @t is 



y't 



t-i 



(a!+l)xp 



Ft^tFt FlRt 
RtFt Rt 



(7) 



where Rt = GtPt-iG[ + ^It and the covariance of y'^ and Qt is determined by 

Cov{vec(2/0,vec(Gt)|St,y*-i} = Cov{vec(F/et + 4), vec(et)|Et, y*-!} 

= Cov{(l ® F/)vec(et),vec(Gt)|St,y*-i} 

= (l0F/)Var{vec(et)|St,y*-n 

= il(g)Fl){J:t^Rt) = ^t(^{FlRt). 

Prom d?]) and the inverted Wishart prior ^ it follows that the joint forecast density of y't 
and 0(, given only y^~^ is a p x 1 multivariate Student t density (see Theorem 4.2.1 of Gupta 
and Nagar, 1999, §4.2), i.e. 



Vt 



with density 



.t-i 



FlGtmt^i 
Gtmt-i 



~ Tp^i\^p-hiiP)n+p-l, 

P'/'^St^iP^^A =Tp^i{u,M,U,S), 



FlRtFt FlRt 
RtFt Rt 



p{T\y 



Tp{{i^ + d + p- l)/2} p/2,^,(^+p-i)/2 
7r<ip/^Tp{{u + p-l)/2}^ ' ' ' 

x\S+{T- MyU~\T - M)|-('^+'^+P-i)/^ 



where T' = [yt 
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Applying Bayes' theorem, the posterior distribution of results to be an inverted 

Wishart. To detail the derivations of this result we need to note that the likelihood function 
of from the single observation yt is L(Sj; y^) = p{y^\T,t, whilst the prior of is given 

by dni). Thus the posterior of given y* is 



oc 



p{y't\y^ 1) p{y't\y^ ^) 



-(p-hr{l3)n+l+2p)/2 



etr 



which is proportional to the inverted Wishart distribution IWp{n* + 2p, St), with 

St = p^/^St-iP^/^ + etQt^e't, n* = p-Hr(/3)n + 1, (8) 

where ej = yt—yt-i{^) = yt—fn't-iG'tFt is the one-step forecast error and Qt = FlRtFt+1. The 
recursions of mt and Pt are calculated routinely, by writing down the posterior distribution 
of 6t|St, y*, i.e. jHf , y* ~ Mpxi{mt, Pti ^t), where from an application of the Kalman filter, 
we have mt = Gtmt-i + RtFtQt^e't and Pt = Rt - RtFtQt^F[Rt. 

The second parameter of the singular multivariate beta distribution (see Lemma [1] in 
Appendix), denoted by g, needs to satisfy two requirements (a) 2q must be positive integer 
number and (b) p~^tr(/3)n+p must equal n+p — 1. (a) is needed for the singular multivariate 
beta distribution to be defined (Uhlig, 1994) and (b) is needed for the distribution of the prior 
Wishart of $(|y*~^ (see Proposition [1] in the appendix). These two requirements result to the 
adoption of the prior 

1 

11 = 

l-p-itr(/3)' 

where (3 may be close, but not equal to Ip. With the above prior of n, the degrees of freedom 
of equation ([8]) become 

n* = p"Hr(/3)n + 1 = \ = n. 

I — p ^tr{P) 

Define rt = yt — m'^Ft, the residual error vector. Then we have that 

n = et{Ig - {RtFtQt^yFt} = etQt\Qt - F^RtFt) = etQt^- 
From this, it follows that equation ([8]) can be written as St = P^^'^St-iP^^'^ + r^e^, or 

St = /?*/25o/3*/2 + £ /?'/V,_,e;_,/?^/2. (9) 

The posterior expectation of St is E(Sj|y*) = St/in - 2) = {1 - p~hi{p))St/i2p~^tr{P) - 1), 
for p~^tr(/3) > 1/2. From equation ([6]), the one-step forecast mean of is E(St|y*~"^) = 
(1 -p-Hr(/3))/3i/25j_i/3V2/(3p-itr(/3) - 2), for p-Hr(/3) > 2/3. 
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The above estimation procedure is valid for < /3j < 1, while from equation ([3]) if 
(3i = (^2 = ■ ■ ■ = Pp = I, then = ^t-i and the volatility is unchanged from t — 1 to t. 
Note that if /3i = /32 = • • • = /3p we have /? = f3ilp and in this special case all elements 
of St are discounted in the same rate. The advantage of employing the discount matrix /? 
is that different elements of the volatility estimator can be discounted at different rate. 
For example for p = 2, one can set (3 = diag(l,0.9), so that with Sf = {crij,t)i,j =1,2, the 
variance au^t has constant volatility, but the variance o"22,f is discounted at a rate according 
to a discount factor of 0.9. The situation /3 = Ip, is leading to a time-invariant volatility 
St = S, for all t, and this is usually impractical. In this case, the posterior distribution of S 
is the inverted Wishart S|y* ~ TWp{nQ + t + 2p, St), with St = Sq + Yll=i "^i^'i^ where no are 
the initial degrees of freedom. In the next result we relate the above posterior estimate St 
with the maximum likelihood estimator of S. First note that conditional on S, the posterior 
distribution of Qt is 

Gt|S,y* ~A/'dxp(mt,Pt,S). (10) 
Then we have the following result. 

Theorem 1. In the MV-DLM |7P suppose that, for all t, St = S, and so conditional on 
S, the posterior distribution of @t is given by equation ^0\). Then the maximum likelihood 
estimator of S, based on data = {yi, y2, ■ ■ ■ , Vn}, is 

1 ^ 

t=i 

where et = yt — m[_-^G'tFt is the one-step forecast error vector and rt = yt — m'^Ft is the 
residual error vector. 

For no = and Sq = 0, the estimator of S, which results from the above inverted Wishart 
prior is Sn = J2tLi ^t^'t = S^r and so the posterior estimator of S equals to the maximum 
likelihood estimator of S. However, when St is a time-dependent volatility matrix, a similar 
procedure for the maximum likelihood estimator of St is not available in closed form and so the 
above sequential Bayesian estimation procedure is thought to be advantageous and preferable 
as opposed to approximate likelihood estimation procedures (Durbin and Koopman, 2001). 
The log-likelihood function when St is time-dependent is given in Theorem [2] of the next 
section. 

4 Model Diagnostics and Model Comparison 

From equation ([6]) we have that the one-step forecast mean of St is E(St|y*~^) = (1 — 
p-hi{f3))(3^/'^St-il3^/^/i3p-^ti{(3) - 2), where p-Hr(/3) > 2/3. The one-step forecast error 
distribution is a p-variate t distribution, i.e. 

et\y''' ~ Tp^k, 0, Qt/3'/'5t_i/3^/2) 

where k = p~^tr(/3)/(l — p'-'^tr (/?)). Note that the condition p~^tr{P) > 2/3 ensures that 
k > 2, hence, given y*"^, the covariance matrix of et exists. By defining 

ut = {Qlf'^t = {{k - 2)Qi^p~'I^S^_\p~^I^Yl^et, 
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the one-step standardized forecast errors, we obtain 

ut|y*-i ~Tpxi{fc,0, {k-2)Ip}, 

where (QJ')^''^ denotes the square root of Q^, based on the Choleski decomposition, or based 
on the spectral decomposition. From this it follows that E(tij|y*~^) = and Var(ut|y*~^) = 
E(?/tti^|y*~"^) = Ip and so, by writing ut = [uu U2t ■ • • Upt]' , one measure of goodness of fit is 
the mean of squared standardized one-step forecast errors (MSSE), defined by 

1 ^ 

t=i 

which should be close to [1 1 • • • 1]', if the model produces a good fit to the data. Of 
course when P = Ip, the above t distributions can not be defined, since tr(/?) = p. In this 
case we have et\y'^~^ ~ ^xi(f^o + i — 1, 0, Qi-S't-i) and then, with kt = uq + t — 1, we get 
^t|y*~^ ~ '^pxi{kt,0, {kt — 2)/p} and hence all other definitions remain unchanged. Other 
measures of goodness of fit are the mean absolute one-step forecast errors (MAE) and mean 
error (ME), defined, respectively, by 

1 ^ 1 ^ 

MAE = J7Y1 [mod(eit) mod(e2t) • • • mod(ept)]' and ME = — ^ ej, 

t=i t=i 

where et = [eu e2t • ■ ■ ept]' and mod(ej() denotes the modulus of eu, for i = 1, 2, . . . ,p. 

Another method of model diagnostics and model comparison is based on the Value-at- 
Risk (VaR), which in laid words is the amount of money of an asset that one expects to 
lose with some probability over a certain time horizon. There are several ways of calculating 
the VaR of a portfolio, but here we mention only the most popular, which is termed as the 
variance-covariance approach and it is due to Morgan (1996). The VaR of a portfolio has a 
single value (under a specific model), which according to Brooks and Persand (2003) is 

VaR(iV,a) = fiN + F^^^il - a/100)aN, 

where VaR(A^, a) is the VaR of a portfolio at time N and percentage significance level a, 
F/v(-) is the distribution function of the standardized portfolio returns {zn — ^n)/(^n-, and 
cT^ is the conditional volatility of z^- For known weights wi,. . . ,Wp satisfying Wi > and 
Yli=i'^i = 1> we define the portfolio returns zt = Yl\=i'^i^'i,t s° its volatility is = 
Yui=iWiO'ii,t + '^^i^jWiWjaij^t, where St = icrij)ij=i^2,...,p- For their internal evaluation 
of market risk, investment banks typically use 95% significance levels, leading to less tight 
evaluation of VaR, i.e. the resulting from VaR amount of money will cover 95% of probable 
loses. The Basel Committee on Banking Supervision (1996, 1998) uses a tight 99% confidence 
percentage to ensure coverage of 99% losses. Clearly VaR(A'', 0.95) < VaR(A^, 0.99), since there 
is needed more money to cover larger proportion of probable loses. More details on VaR and 
its evaluation may be found in Tsay (2002, Chapter 7) and Chong (2004). 

Another measure of goodness of fit, is based on the evaluation of the log-likelihood func- 
tion, as a means of model design (e.g. choosing values of the discount matrices A and /?) and 
model comparison. The next result gives an expression of the log-likelihood function. 
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Theorem 2. In the MV-DLM ([7P denote with . . . , Sat; y^) the log-likelihood func- 

tion of Si, S2, . . . , Sat, based on data = {yi, 2/2, • • • , Un}- Then it is 



^ t=i 

+plog \Lt \ -\- {m - p - 2) log \T,t 



plogQt + ip- m) log |St-i| + ejQj ^Sj 

t=i ' 



anc 



c = ^("^^ P) ^ log ^. ^ jY log Tp{{m + l)/2} - log 2 - p TV log tt - iV log Tp{m/2) , 

i=l 

where j3 = diag{l3i, f32, ■ ■ ■ ,Pp), = P~^tKP)/{^ ~ +P — 1 o-nd Lt is the diagonal 

matrix with diagonal elements the positive eigenvalues of Ip — (Cj'„i)~^/3^/^S^"'^/3^/^Cjl\, with 

Note that if /? = Ip, then T,t = T,, for all t, and the log-likelihood function of S reduces to 

nN n ^ N 1 ^ 

y^) = -P- log(27r) - | ^ log - - log - - ^ ^Q^'^-'et. (11) 

The log-likelihood function of Theorem [2] is clearly provided conditional on the values of A 
and P and so, replacing St by St/{n — 2) (the posterior mean of S^) in the log-likelihood, 
one way to choose these values is by maximizing the log-likelihood over a range of candidate 
values for A and [5. 

In model comparison, the log-likelihood function is particularly useful, as it can be used 
forming likelihood ratios in order to compare and contrast the performance of two mod- 
els. A similar idea can be implemented by considering sequential model monitoring, for 
which, two models are compared by using sequential Bayes' factors of the standardized errors 
ui,U2, ■ . ■ ,U]\f . Following the ideas of West and Harrison (1997, Chapter 11) and Salvador 
and Gargallo (2004), we consider two models A4i and Ai2, which differ in some quantitative 
form, e.g. in the values of the discount matrices, and by writing all densities conditional on 
these two models, we form the log Bayes' factor 



LBF{t) = log — — I , . . . , 



t = l,2,...,N. 



Then, at time t, Mi is in favour of M2 (equiv. M2 is in favour of Mi), if LBF{t) > 
(equiv. LBF{t) < 0), while when LBF[t) = 0, the two models are equivalent, in the sense 
that they produce similar forecasts and similar standardized forecast errors. Some algorithms 
have been proposed in the literature about how the above test can be done efficiently. Some 
work includes Monte Carlo simulation (Salvador et al, 2004), some work is restricted in 
the case of a time-invariant volatility matrix (Salvador and Gargallo, 2004) and most of the 
work refers to univariate processes (West and Harrison, 1997, Salvador and Gargallo, 2004, 
2005, 2006). Triantafyllopoulos (2006b) proposes a general procedure, according to which, 
a modified exponentially weighted moving average control chart is applied to the univariate 
process {LBF{t)}t=i^2,...,N and control signals indicate model preference. 
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The above ideas of model comparison, based on Bayes' factors, can also be applied to 
the problem of sequential monitoring of a single model. This approach, which is explored in 
detail in West and Harrison (1997, Chapter 11) and in Salvador and Gargallo (2004, 2005, 
2006), proposes the adoption of a set of alternative models, compares the current model with 
these and makes a sequential decision adopting the best model, according to the behaviour 
of the Bayes' factor. 



5 Example: The London Metal Exchange Data 
5.1 Description of the Data 

The London metal exchange (LME) is the world's premier non-ferrous metals market, with 
highly liquid contracts. Its trading customers may be metal industries or individuals (sellers 
or buyers). The metals currently traded in the exchange are: aluminium, copper grade A, 
standard lead, primary nickel, tin, and zinc. More details about the LME can be found on 



its web site: http : //www . Ime .co.uk 



The importance of the LME and its operations has recently invited considerable interest. 
Here, from a statistical point of view, we mention the work of McKenzie et al. (2001) and 
the review of Watkins and McAleer (2004). Triantafyllopoulos (2006a) gives a brief account 
to the statistical work on the LME. 

In this paper we concentrate on spot prices of four metals exchanged in the LME, namely 
aluminium, copper, lead and zinc. We have 4 variables of interest collected in the observation 
vector ut = [yu y2t Vst Uit]' ■ Each variable comprises the spot price per tonne of metal: yu 
is the spot variable, which indicates the daily/current ask price per tonne of aluminium; the 
remaining three variables are the relevant spot ask prices of copper, lead and zinc, respectively. 
The data are collected for every trading day from 4 January 2005 to 28 April 2006, and are 
plotted in Figured) After excluding week-ends and bank holidays, there are N = 334 trading 



days. The data have been obtained from the LME web site: http://www.lme.co.uk 



5.2 Statistical Analysis 

Here we consider the compound return time series {xt}t=i.2,...,333 with xt^i = log y t — logy t~i, 
for t = 2,3, . . . , 334. Most of the current literature in econometrics is focused on modelling 
only the volatility of the series, but for the MV-DLMs considered in this paper, one can model 
with the same model the returns (for forecasting purposes) and estimate the volatility matrix. 
We use the model 



xt = nt + et, fit = Q'tF, St = Qt-i + uJt, (12) 

where et ~ A^4xi(0,Sj), tot ~ N2x4:{0,^t,'^t) and nt is the level of the series at time t. The 
design vector F = [1 0]' is invariant of time and a random walk evolution for the states Qt has 
been chosen, which is suitable for modelling the compound returns (Tsay, 2002, Cuaresma 
and Hlouskova, 2005). The volatility of the series is measured with the volatility matrix St, 
which is subject to estimation. There might be some uncertainty on the dimension d of the 
rows of Qt, but here for parsimonious modelling we choose a low value for d. It might be 
worthwhile to consider d as random, but this can add computational delays to the estimation 
process. The 2x2 evolution covariance matrix ^It can be specified with two discount factors 
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Figure 1: LME data, consisting of aluminium (solid line), copper (dashed line), lead (dotted 
line) and zinc (dashed-dotted line) spot prices (in US dollars per tonne of each metal). 

6i and 82 according to the discussion in Section [2j However, it can be seen that since F is 
time invariant model (jl2|) can be decomposed as 

xt = fit + et, Ht = fJ-t-i + Ct, Ct ~ A'pxi(0, F'^ltFT^t), 

which is a random walk plus noise model. Since F'QtF = (1 — 6i)pii^t-i/Si, where Pt = 
iPij,t)i,j=i,2,3,4, it can be seen that only 61 has a contribution to the model and in particular 
model ()12p is equivalent to a model with a single discount factor, i.e. $7^ = (1 — 6)Pt-i/5 and 
6 = 61. So there are five discount factors of interest: 6, which is the discount factor responsible 
for the random walk evolution of the level nt, and f3i, (32, f33, Pa, which are responsible for the 
evolution of the 4x4 volatility matrix E^; f3i is the discount factor for the volatility of the 
compound series Xi, where (3 = [f3i P2 Ps Pa]' and xt = [xu X2t a^st x^t]' ■ We specify the priors 

GolSo ~A/'2x4(0, 1000/2, So), /xo|$]o~AA4xi(0,1000So), So ~ XW4(n + 8, 14), 

where n = 1/(1 - 4"Hr(/3)). 

Table [1] shows two performance measures, namely the MSSE and the log-likelihood func- 
tion (see Section H]). Two values of 6 are picked and compared with; a small value 5 = 0.08 
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(corresponding to an adapting, but not smooth evolution for the level fit) and a high value 
S = 0.8 (corresponding to a smooth evolution for the level fj-t). The ME was found to be 
constant throughout the range of /?, but changing for the two values of 5; for 6 = 0.08 it was 
ME = [0.04 -0.18 0.00 0.05]' and for 6 = 0.8 it was ME = [0.21 1.22 0.17 0.01]'. From Figure 
[T] it is apparent that the aluminium and the zinc evolve together (their difference appears to 
be in their levels) and likewise the copper and the lead evolve together. This can be reflected 
in our model by choosing Pi = (5^ and /?2 = /^s so that the volatilities of say aluminium and 
zinc will be similar. Table [1] shows the two performance measures (MSSE and LogL) for a 
range of admissible values of /3i and /?2, given that tr(/3)/4 > 2/3 so that the one-step forecast 
mean of exists (see Section [3|). For all (5 ^ I4, and for 5 = 0.08, the log-likelihood function 
is maximized for /3i = /32 = /?3 = A = 0.2 (LogL = —11421.3), but this value can not be 
allowed, because (0.2 + 0.2 + 0.2 + 0.2)/4 = 0.2 < 2/3. The highest value of LogL is achieved 
for j3i = 0.65 and (52 = 0.7, but this produces poor performance in the MSSE. Our choice 
is for f5i = 0.66 and (32 = 0.9, returning reasonable values of the MSSE and a not very low 
value for the LogL. When comparing the performance of the models for the discount factors 
5 = 0.08 and 5 = 0.8, we note that the log-likelihood function corresponding to 5 = 0.8 is 
smaller than that of 5 = 0.08. Similarly the ME produced with 6 = 0.8 is too large and the 
MSSE does not achieve a decent value for all four variables. Therefore we conclude that a 
high discount factor 5 should not be chosen. 

Table [U also reveals that a choice of (3i = (32 = (^3 = (3^ is inadequate, leading to poor 
performance in the MSSE. It is clear that there are two main factors driving the volatilities 
of the metals and these factors are expressed here by the two discount factors (3i and (32 ■ The 
log-likelihood function for /3 = [1 1 1 1]' (e.g. when St = S), is —00 when the formula of 
Theorem [2] is used (due to the infinity at the value of m), but this likelihood is just -10344.66 
when formula (llip is used. The fact that this log-likelihood appears to be the maximum 
likelihood, is due to the fact that in the likelihood pT]) the part of logp(Si|Sf_i) does not 
appear. Likelihood (ITT]) should only be used when there is strong evidence to suggest that 
the volatility is constant, which clearly is not the case in this data set. 

Table [2] shows the evaluation of VaR based on the variance-covariance approach (see 
Section U]) for several values of (3i and (32 and for 5 = 0.08, 5 = 0.8 and 5 = 1. Typically 
a 95% confidence level is used by investment banks and a 99% confidence level is used by 
the Basle Committee (Chong, 2004). 5 = 1 refers to a time-invariant level = which is 
adopted in many MGARCH type models (Bauwens et al, 2006), while 5 = 0.8 generates a 
time-dependent, but smooth level, and 5 = 0.08 generates a highly adaptive time-dependent 
level fit- Table [2] shows that, for the same parameters of /3, the VaR using 5 = 1 and 5 = 0.8 
are larger as compared with 5 = 0.08. Within 5 = 0.08, the parameters (3i = 0-7,(32 = 0.8, 
Pi = 0.66, P2 = 0.9, 0.9, P2 = 1 and Pi = P2 = 1 result to the best models. From Tables [I 
and [2] we suggest that the overall best model is this with Pi = 0.66, /32 = 0.9, producing not 
very low log-likelihood function, a decent MSSE and a relatively low values of VaR. 

Figure [2] shows the one-step forecast of the volatilities (diagonal elements of Tit) and 
Figure [3] shows the respective forecasts of the correlations of Sj. Figure [2] illustrates that 
the volatilities of aluminium and zinc have a similar pattern and the volatilities of copper 
and lead have a similar pattern. Copper and zinc appear to be the most volatile and this is 
expected if we look at Figure [H where the trends of copper and zinc are less smooth than 
those of aluminium and zinc. Figure [3] confirms that the aluminium and the zinc are more 
correlated than the aluminium and the lead. This figure also indicates that the correlations 
are not very high in modulus. 
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Table 1: Mean square one-step forecast standardized errors (MSSE) and log-likelihood func- 
tion (LogL) evaluated at the posterior mean St/{n — 2), where /? = [Pi (32 (32 Pi]'- 
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Table 2: 95% and 99% VaR values of the portfolio of xn = [xi^n X2._n x^^n x^^n]' , for several 
values of ;3 = diag(/3i, /32, f32,Pi), for 6 = 0.08, <5 = 0.8 and 5 = 1. 
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From Figure [T]we can clearly see that the aluminium and the zinc are locally co-integrated 
of order 1, and the copper and lead are also locally co-integrated of order 1. Here we use 
the term locally co-integrated of order d to indicate that a linear combination of each of the 
two variables are, after d steps of integration, locally stationary (in the sense that for a time 
period, known also as regime the time series is weakly stationary). The aluminium and the 
copper are not co-integrated and the same applies for the copper and zinc. This fact is 
apparent in the volatilities (Figure [2|) and in the model this is reflected by the choice of two 
distinct elements in the discount matrix /3, i.e Pi = (34, and 02 = 03- There are two distinct 
factors driving the volatilities of the four metals and a factor volatility model could be applied 
to reduce the complexity (Aguilar and West, 2000, Tsay, 2002, §9.4). 

6 Discussion 

This paper develops a new Bayesian procedure for estimation and forecasting of multivariate 
volatility. It is proposed that the evolution of the unknown volatility covariance matrix is 
modelled with a multiplicative stochastic model, based on Wishart and singular multivariate 
beta distributions. The resulting algorithm is capable to estimate the volatility element by 
element. This is achieved by employing variance discounting using several discount factors 
and thus allowing different volatilities to be discounted at different rates. 

In the last two decades many models have been developed for multivariate volatility esti- 
mation (see Section [T]). Here we provide a discussion of the advantages of our proposal com- 
pared to the multivariate GARCH (MGARCH) models, reviewed in Bauwens et al. (2006). 
Some of the MGARCH models result as generalizations of univariate GARCH models (e.g. 
the VEC, the constant-correlation GARCH, and the BEKK models, see also Section[T]). From 
these models the constant-correlation GARCH model makes the strong and usually unrealistic 
assumption of a constant correlation matrix, whilst the VEC and even the BEKK have too 
many parameters to estimate. The large number of parameters to be estimated, restrict these 
models to applications of relatively low dimensions, usually not exceeding p = 3. The factor 
GARCH models (e.g the factor-BEKK model) overcomes this difficulty, but in practice the 



14 





50 150 250 

(a): Aluminium and Zinc 



1 I I I I I r 

50 150 250 

(b): Copper and Lead 



Figure 2: One-step forecasts of the volatility of aluminium (solid line of panel (a)), zinc 
(daslied-dotted line of panel (a)), copper (dashed line of panel (b)) and lead (dotted line of 
panel (b)). 



specification of the factors is not simple (Tsay, 2002, §9.4). The dynamic-correlation models 
(Bauwens et al, 2006; Audrino and Barone-Adesi, 2006) aim to combine the flexibility of the 
constant-correlation GARCH, but to overcome the main drawback of that model by introduc- 
ing a specific time-dependent structure on the correlation matrix. This can be done in several 
ways, but its main drawback is that, if the dimension of the parameters is to be manageable, 
the correlation matrix is driven by scalar parameters, which means that all correlations have 
the same weight of change. Perhaps, this is not a major issue for bivariate time series data, 
but for higher dimensions it is unlikely to hold true. In our MV-DLM model we overcome this 
problem by introducing the matrix of discount factors /3 and by discounting the volatilities 
and the corresponding correlations at different rates. 

The usual setting of a MGARCH model is that of yt = fit + where fit is the level 
of the series yt (usually the series will be the compound returns of some assets or exchange 
rates), with et being the innovation series, following et\T,t ~ A/'pxi(0, St) and represents 
the volatility matrix subject to estimation. While it is recognized that the volatility can 
affect the level, in some MGARCH studies the level is time- invariant (Bauwens et al, 2006), 
and in some other studies the level is assumed to have a simple evolution, e.g. to follow 
an autoregressive model of order one (Audrino and Barone-Adesi, 2006). In the latter case 
estimation is usually performed separately in the AR and GARCH components, which may 
not be desirable for on-line forecasting. Our proposed model does in fact allow for much 
more complicated structure in fit, through fit = Q't^t and through the evolution equation of 
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Figure 3: One-step forecast of the correlation of aluminium, copper, lead and zinc. Panel 
(a) shows the correlation of aluminium with copper (solid line), the correlation of aluminium 
with lead (dashed line) and the correlation of aluminium with zinc (dotted line); panel (b) 
shows the correlation of copper with lead (solid line) and the correlation of copper with zinc 
(dashed line); panel (c) shows the correlation of lead with zinc. 
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Qt, see equation ([T]). This can include structural characteristics such as trend and seasonal 
components and applying the principle of superposition of state space models (West and 
Harrison, 1997, Chapter 6), one can build complex multivariate time series models, for which 
estimation of the states is accompanied by simultaneous estimation of the volatility. This is 
not achievable, by neither ARIMA type models, nor by MGARCH models alone. In order 
to build such models one has to consider a multivariate ARIMA model, with errors following 
MGARCH models. In such models there are inferential problems regarding to estimation and 
in the literature simple models have been considered; for a univariate discussion on this topic 
see Fiorentini and Maravall (1996) and Audrino and Barone-Adesi (2006). 

An important issue, which is discussed in Bauwens et al. (2006), is that of marginalization. 
If yt follows a MGARCH model, the question is whether = Ayt, follows the same type of 
MGARCH model, where A is a. q x p matrix of constants. This is an important problem, 
because if the model is closed under linear transformations (or else if it is invariant under 
linear transformations), then one can easily study the volatility of a linear combination of 
some assets, for example to estimate the volatility of a portfolio and hence the value at risk 
of a portfolio. As pointed out in Bauwens et al. (2006) not all GARCH models are invariant 
under the above linear transformation. The MV-DLM is invariant, under some regulatory 
assumptions. Consider model ([T]) and define A a q x p matrix of rank q. Then, we can write 

iy;y = FiQ; + {e;y, e: = Gteu + ^h e,* ~aa,xi(o,s*), ~AA,x,(o,Oi,s*), 

where = @tA' , el = Aet, = AT,tA', u>l = uJtA' and the remaining components of the 
model is as in ([1]). Although, in the above model we can not obtain an explicit formula for the 
precision ^'t = {AT,tA')~^ , it is clear that using distribution theory, we can establish that the 
linear transformation y^ = Ayt follows a MV-DLM with dimensions q and d. For example, we 
can readily see that from the posterior Htly* ~ TWp{n + 2p, St) we have S*|(2/*)* ~ IWq{n* + 
2q, AStA'), where n* = n + 2{p — q). It follows that all scalar yu, with yt = [yu y2t • ■ ■ Vpt]', 
follow univariate DLMs of the form of West and Harrison (1997, §10.8) and the posterior 
distributions of the diagonal elements au^t, C22,i, • • • , fpp,i of = {crij^t)i,j=i,2,...,p are inverted 
gamma. 

The Bayesian estimation approach of the MV-DLMs is preferred to the usual maximum 
likelihood estimation approach of most of the MGARCH models or to Bayesian estimation 
based on Monte Carlo simulation. The proposed Bayesian approach is delivered in closed 
form and thus it is available for on-line estimation. In the maximum likelihood estimation 
approach, adopted in many MGARCH models, given a sample, the aim is to estimate a set of 
parameters, sometimes a reasonably large number of them and sometimes the maximization 
will be computationally expensive and time consuming. This procedure may not be suitable 
for sequential application, since the parameters and their estimates seem to lose one of their 
dynamic power, which is to adapt and to update as new information comes in. Our model 
is adaptive to new information and it is computationally cheap, which makes it suitable for 
volatility estimation of high dimensional data. 
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Appendix 



In this appendix we detail the proofs of arguments in Sections [3] and HI We begin with the 
prior distribution 

Proposition 1. Consider model (OP with the priors (0j and the evolution Let the 

posterior precision at time t — 1 be ^ 'Wp{n + p — l,S^\). Then, with the 

prior degrees of freedom n = 1/(1 — p~^tr{(])), the prior distribution of is ^t\y^~^ 
Wp{p-Hr{P)n + p - 1, /3-i/25-\/5-i/2). 

The proof is a direct consequence of the model assumptions and Theorem 1 of Uhlig 
(1994). From the above proposition, the prior ^ is obtained from = $7^. 

Proof of TheoremUl The proof of the maximization of the log-likelihood function requires 
matrix-differentiation, in particular, first and second order differentiation in terms of E. Here 
we follow the matrix-differentiation notation of Harville (1997) and the proof mimics the early 
work on log-likelihood maximization of Harvey (1986, 1989, §8.3). An alternative proof can 
be obtained by employing the log-likelihood maximization procedure, used for VAR models, 
of Lutkepohl (1993, pages 80-82). 

With the posterior (fTOl) . the forecast distribution of is ~ J\fpxi{'m[_^G[Ft, Qt^), 

where Qt = Fj-RtFt + 1 and mt-\ and Rt are defined in Section [3l The log likelihood of S is 

N N 

my"") = lognp(yi|S,y*-^) = -^log(27r)-|5^1ogQi--log|S| 

i=l i=l 



1 ^ 

2 - FlGtmt-i)QT^^-\yt - m[_^G[Ft). (A-1) 



t=i 

Taking the first derivative of £{T,;y^) we get 

dmy"") ^ N dlog |S-iri 1 ^ d{{yi - FiGtmt^,)Qi^T.~\yt - m,„iG^F,)} 
2 SS-i 2^ 

N 

= AS- -^diag{fTii, 0-22,..., CTpp} 



and this leads to 



N , N 



t=i t=i 



since 



rt = yt- m[Ft = yt- m'^^^G[Ft - ctA'^Ft = (1 - A[Ft)et = Q;\Qt - F[Pt-iFt/ 5)et = Qi^et- 

To prove that the second partial derivative of ^(S; y^) with respect to S is a negative definite 
matrix, first we show that the second partial derivative of ^(E; y^) with respect to is a 
negative definite matrix. Denote with Dp the duplication matrix (i.e. vec(S) = DpVech(E), 
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where vech(-) is the column stacking operator of a lower portion of a symmetric matrix) 
and write Hp to be any left inverse of Dp (i.e. HpDp = Ip). One choice for Hp is Hp = 
(DpDp)~^ Dp. For any vector a, let diag(a) denote the diagonal matrix with diagonal elements 
the elements of a. Write a = vech(S) and a* = vech(S~^). From equation (\A-2\i we have 

_ iV c>vech(diag{(7ii , ^22, . . . , cipp}) 



, -NHpii: ® ^)Dp , 

= -NHp{ll®Y.)Dp 



N 9vech(diag{(Tii , 0-22, • • • , o'pp}) da 



= -NHp{^ ® T.)Dp + ydiag{vech(/p)}i7p(S ® T.)Dp 
N 

= -y[2/p(p+i)/2-diag{vech(/p)}]i7p(S^S)i?p<0, (A-3) 

which is a negative definite matrix, since both 2/p(p_|_i)/2 ~ diag{vech(/p)} and Hp{T, T,)Dp 
are positive definite. 

Now using the chain rule for matrix differentiation we have 



dada' da^a'^ \ da' J da'^ dada' 

and at S = S^v we have that 



dada' 



s=E„ da^a'. 



E=E» 



/ da, 
\d^ 



<0' 

S=Sjv 



which from ()A-3p is a negative definite matrix and so S^v maximizes the log-likelihood function 

Before we prove Theorem [21 we give the following lemma. 

Lemma 1. Suppose that the pxp matrix B follows the singular multivariate beta distribution 
B ~ Bp{m/2,n/2), with density 

^ (n2-pn)/2 rp{(m + n)/2} (^_p_i)/2| ,(^_p-i)/2 

' Tn{n/2)Tp{m/2y ' ' ' 

where n is a positive integer, m > p — 1, Ip — B = HiKH'^, K is the diagonal matrix with 
diagonal elements the positive eigenvalues of Ip — B, and Hi is a matrix with orthogonal 
columns, i.e. HiH'^ = Ip. For any non-singular matrix A, the density of X = AB~~^A', is 

p(X) = rp{(m + re)/2} fa_„^i)/2,^,_(m_,p_i)/2^ 

r„(n/2)rp(m/2) 

where L is the diagonal matrix including the positive eigenvalues of Ip — A'X~^A. 

Proof. First note that X is a non-singular matrix and \B\ = \A\'^\X\~^ . From Diaz-Garcfa 
and Gutierrez (1997), the Jacobian of B with respect to X is 

(dB) = |A'|(P-"+l)/2|^|-(p-n+l)/2|^|n^^^)^ 
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where K is defined as in the theorem. Then from the singular multivariate beta density of B 
we obtain 

V(X) = ;p("'~P»)/2 rp{(m + re)/2} .^.n^j^\(n^p~l)/2\-^\(-m^p~^l)/2 

^ ' Tn{n/2)Tp{m/2y ' ' ' ' ' 

x|_^|(p-n+l)/2|^|-(p-n+l)/2^ 

from which we immediately get the required density of X. □ 
Proof of Theorem\^ First we derive the likelihood function S2, . . . , Sat; y^). We have 

L(i;i,i;2,...,S7v;y^) = 2/2, • ■ ■ , yA^lSi, S2, . . . , Sat) 

= p{yN\^N,y^~^)p{yi,y2, ■ ■ . ,yN~i\^i,^2, ■ ■ -^^n)- 

By Bayes' theorem the last part of the right hand side is 

piyi,y2,--- ,yAr_ijSi, 1)2, • • • ,^n) (x p{Y,N\^N~i,y'^~^)p{yi,y2, ■ ■ ■ ,yN-~i\^i,^2, - ■ ■ ,^n-i) 
and so applying the last equation repeatedly we have 

TV 

S2, . . . , J^N^y"") = c* l[p{yt\^t, y'-'). (A-4) 

t=i 

The density p{yt\T,t,y^~^) is a multivariate normal density, since from the Kalman filter 
yt\T,t,y^"^ ~ N'pxi{m[__iG[Ft,QtT.t). The density p{T.t\T.t-i,y*~^) is the density p{X) of 
Lemma □ with A = I3^^^C~\, T^t = C^^{C^^)', m = p-Hr(/3)/(l - p-hx{(3)) + p-l and 
n = 1. The required formula of the log-likelihood function is obtained from ()A-4p by taking 
the logarithm of ^2, ■ ■ ■ , ^N]y^), for c* = 1. □ 
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